Assessing ascertainment bias in atrial fibrillation across US minority groups

The aim of this study is to define atrial fibrillation (AF) prevalence and incidence rates across minority groups in the United States (US), to aid in diversity enrollment target setting for randomized controlled trials. In AF, US minority groups have lower clinically detected prevalence compared to the non-Hispanic or Latino White (NHW) population. We assess the impact of ascertainment bias on AF prevalence estimates. We analyzed data from adults in Optum’s de-identified Clinformatics® Data Mart Database from 2017–2020 in a cohort study. Presence of AF at baseline was identified from inpatient and/or outpatient encounter claims using validated ICD-10-CM diagnosis algorithms. AF incidence and prevalence rates were determined both in the overall population and in a population with a recent stroke event, where monitoring for AF is assumed. Differences in prevalence across cohorts were assessed to determine if ascertainment bias contributes to the variation in AF prevalence across US minority groups. The period prevalence was 4.9%, 3.2%, 2.1% and 5.9% in the Black or African American, Asian, Hispanic or Latino, and NHW populations, respectively. In patients with recent ischemic stroke, the proportion with AF was 24.3%, 25.0%, 24.5% and 32.2%, respectively. The prevalence of AF among the stroke population was approximately 7 to 10 times higher than the prevalence among the overall population for the Asian and Hispanic or Latino populations, compared to approximately 5 times higher for NHW patients. The relative AF prevalence difference between the Asian and Hispanic or Latino populations and the NHW population narrowed from -46% and -65%, respectively, to -22% and -24%. The study findings align with previous observational studies, revealing lower incidence and prevalence rates of AF in US minority groups. 
Prevalence estimates of the adult population, when routine clinical practice is assumed, exhibit higher prevalence differences compared to settings in which monitoring for AF is assumed, particularly among Asian and Hispanic or Latino subgroups.


Health disparities in the US
Population subgroups that face health disparities are often underrepresented in clinical research, including clinical trials [1][2][3][4]. The bias introduced by underrepresentation of certain populations in randomized controlled trials (RCTs) can result in the inability to apply trial findings to the underrepresented patient subgroups. These understudied populations may experience a different response to the intervention of interest due to variation in disease-specific risk factors, disparities in health care access or pathogenesis differences [5].

Underrepresentation in clinical trials
In the United States, members of racial and ethnic minority groups have historically experienced inequities in healthcare access and quality, and continue to be underrepresented in clinical trials [6]. Race and ethnicity are proxies for populations that have been historically undertreated or mistreated due to issues with access to care, provider mistrust, socioeconomic status and historic injustices. The importance of representation of minority groups in RCTs is underscored by the passing of the National Institutes of Health (NIH) Revitalization Act by Congress in 1993 [7]. This law encourages enrollment of underrepresented populations in RCTs with the goal of increased representation in clinical research and evidence generation. Despite these efforts, ensuring representation and reducing racial and ethnic disparities in clinical research continues to be a challenge across many therapeutic areas [3][8][9][10][11][12].
Recent guidelines published by the United States (US) Food and Drug Administration (FDA) expand on methods and approaches for increasing representation in RCTs, with the goal of reducing racial and ethnic health disparities [13]. These approaches include establishing subgroup enrollment targets, increasing access to RCT participation, and broadening eligibility criteria. Having more accurate estimates of incidence and prevalence in underrepresented populations can better inform target enrollment goals for these patients. However, setting appropriate enrollment targets for RCTs is challenging when the underlying incidence and prevalence of diseases of interest among racial and ethnic subgroup populations is not well estimated. The FDA suggests leveraging real-world data (RWD) to estimate the incidence and prevalence of health outcomes within racial and ethnic groups, to develop diversity enrollment targets in RCTs [14]. RWD provides an opportunity to assess racial and ethnic differences and disparities in healthcare delivery and outcomes; these epidemiologic findings will allow for real-world representative enrollment targets, compared to simpler overall demographic estimates [15][16][17].

Assessing incidence and prevalence estimates across racial and ethnic groups with RWD
When assessing racial and ethnic differences in incidence and prevalence in RWD, the aim is to characterize true differences in disease presentation amongst subgroups and exclude differences that originate from healthcare delivery bias. Patient level healthcare data are not a random sample from the overall population and tend to be biased towards patients that have access to healthcare. One such example is ascertainment bias, a sampling bias that can result from differences in healthcare access, provider bias, and healthcare system behaviors. This is particularly true for, but not limited to, insurance claims based RWD, where communities which historically have been subject to health disparities can be under- or misrepresented [18][19][20]. Ascertainment bias can be introduced by different people during a healthcare intervention, from the person receiving the intervention to the person administering the intervention. Groups that are under-ascertained may be less likely to seek or receive care, even when comparably insured, due to other factors such as accessibility, language, cultural and knowledge barriers, systemic racism, and discrimination. Additionally, even when they do seek care, these groups can still be misdiagnosed due to provider bias [21].
purchase by contracting with the database owner, Optum (contact at: https://www.optum.com/business/life-sciences/real-world-data/claims-data.html). The authors did not have any special access privileges that other parties who license the data and contract with Optum would not have.

Atrial fibrillation (AF) in racial and ethnic groups in the US
AF prevalence and incidence are growing due to the aging population, increased risk factors, and improved detection [22]. AF is often diagnosed when patients present with symptoms or incidentally during routine medical examinations, through systematic screening, or following a medical event (e.g., ischemic stroke). Early detection of AF is critical to reduce the risk of complications; however, AF is often paroxysmal (intermittent), and diagnosis remains challenging. Numerous observational studies have studied risk, incidence, and prevalence estimates of AF across racial and ethnic groups [23][24][25][26][27][28][29][30][31][32][33][34][35][36][37][38][39]. Although Black or African American populations show higher rates of traditional cardiovascular risk factors, the incidence and prevalence rates of AF remain lower when compared to NHW populations. This contradiction is often referred to as the 'AF paradox' [40]. While the low prevalence of AF in the Black or African American population has been thoroughly studied, with several proposed hypotheses for this phenomenon ranging from ascertainment bias to genetic and environmental risk factors, limited research has been conducted to examine the lower AF incidence and prevalence rates within the Asian and Hispanic or Latino populations.
Generally, studies can be divided into two groups based on AF detection method. The clinical-based detection group refers to studies where AF cases were identified with routine clinical methods. This includes validating the AF diagnosis by ECGs obtained during protocol-driven events or at study baseline, through hospitalization records, EHR, administrative claims, discharge codes, death certificates, or self-reporting by the patient. The monitoring-based detection group refers to studies where AF cases were identified through ambulatory monitoring, e.g., 48-hour or 14-day ambulatory ECG recordings, or where ECG monitoring is presumed through clinical events, e.g., patients with a pacemaker or patients who experienced a stroke event. A detailed overview of studies in terms of the enrollment period, enrolled population, and AF detection method is available in S1 Table.
This study aims to estimate the incidence and prevalence of atrial fibrillation within racial and ethnic groups using a large-scale US-based claims RWD source, to support evidence-based diversity enrollment target setting. The incidence and prevalence measures are explored both in the overall adult population and in a targeted population that has been systematically tested for AF after an ischemic stroke event. The hypothesis of this study is that populations undergoing systematic AF screening for medical cause will provide incidence/prevalence estimates closer to the "true" disease occurrences. Additionally, we aim to better understand the impact of ascertainment bias on prevalence rates by computing prevalence ratios and relative differences across racial and ethnic groups, both in a clinical-based and monitoring-based detection setting in claims data.

Study design and setting
This non-interventional cohort study was designed to evaluate the annual incidence and prevalence of AF in an overall patient population as well as a subset with a history of ischemic stroke. Annual incidence was evaluated for each calendar year from 2017-2020. Prevalence was evaluated for the period 2017-2020. The study used Optum's de-identified Clinformatics® Data Mart Database (CDM) from 1 January 2016 to 31 December 2020. This US-based longitudinal dataset consists of administrative health claims for approximately 70 million de-identified patients sourced from a large national managed care company in the US, and includes data on member eligibility and demographics, inpatient and outpatient claims, outpatient lab tests, and socioeconomic information. The patient's name and geography are used by the data vendor to map the patient to one of five race and ethnicity categories: Asian, Black or African American, Hispanic or Latino, non-Hispanic or Latino White, or Unknown/Other, aligned with the U.S. Office of Management and Budget (OMB) standards classifications. CDM data only contains de-identified health information as described by the HIPAA Privacy Rule. No direct identifiers of individuals or providers are included. The use of the CDM does not involve human subjects research and is exempt from institutional review board approval.

Population
The overall study population included all adults enrolled in CDM between 2017 and 2020. For annual incidence and prevalence, cohort eligibility was determined for each calendar year of study. Patients were required to be 18 years or older on 1 January of the calendar year under study and have continuous enrollment for the year of interest and the 12 months prior. To determine an overarching period prevalence for 2017-2020, patients were required to be at least 18 years of age on the date they met the baseline enrollment requirements for the 2017-2020 cohort. Study diagrams are provided in S1-S4 Figs.
The history of stroke subgroup required an inpatient medical claim with a primary diagnosis of stroke (ICD-10: H34.1, I63*, I64*) [41] between 1 October of the year prior to the year of interest and 30 September of the year of interest for the year-over-year cohorts, or between 1 October 2016 and 30 September 2020 for the 2017-2020 full period cohort. Additionally, the stroke event must have preceded the AF diagnosis (or occurred on the same day). Index date was defined as 1 January for the year-over-year cohorts, and as the first day that patients met the age and baseline enrollment requirements for the 2017-2020 full period cohort. This cohort was used as a proxy for a population that underwent systematic testing of AF within RWD, as people who have an ischemic stroke are screened for AF with an ambulatory electrocardiography device such as a Holter monitor.

Outcome and covariate definitions
In the CDM, race and ethnicity are combined in a single construct as Asian, Black, Hispanic, and non-Hispanic White. However, it is noted that race encompasses a broad set of social constructs influenced by societal, ancestral, and geographic factors, while ethnicity commonly refers to cultural background, often with a shared language or religion. As such, although there are diverse perspectives on preferred terminology, the emphasis of the study lies in understanding the impact of this covariate on health outcomes [42]. Other covariates of interest include the CDM-defined categories of education, occupation, net worth, family income, and enrollment. Missing or unknown values were grouped together into one missing/unknown category per covariate. Demographic characteristics were assigned based on the last value recorded prior to cohort entry.
An AF event was defined as at least 2 outpatient claims occurring at least 7 days and at most 365 days apart, or 1 inpatient claim, each containing ICD-10 diagnosis code I48*. Patients who met the outpatient definition were classified as having AF on the date of their second outpatient diagnosis. Patients who met the inpatient definition of AF were classified as having AF on the date of their inpatient admission [43].
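The case definition above can be illustrated with a minimal sketch. The claim layout (a tuple of service date, care setting, and diagnosis code), the function name `af_index_date`, and the choice to take the earlier date when a patient meets both definitions are assumptions made for illustration; they are not taken from the study's implementation on the Aetion Evidence Platform.

```python
from datetime import date

def af_index_date(claims):
    """Return the date a patient first meets the AF definition, or None.

    `claims` is a list of (service_date, setting, dx_code) tuples, where
    setting is "inpatient" or "outpatient" (hypothetical layout). For
    inpatient claims, service_date stands in for the admission date.
    """
    inpatient = sorted(d for d, s, c in claims
                       if s == "inpatient" and c.startswith("I48"))
    # Distinct outpatient dates carrying an I48* diagnosis, in time order
    outpatient = sorted({d for d, s, c in claims
                         if s == "outpatient" and c.startswith("I48")})

    candidates = []
    if inpatient:
        # Inpatient definition: one claim is enough
        candidates.append(inpatient[0])
    # Outpatient definition: the earliest claim that falls 7 to 365 days
    # after some earlier outpatient claim is the "second" diagnosis
    for j, second in enumerate(outpatient):
        if any(7 <= (second - first).days <= 365 for first in outpatient[:j]):
            candidates.append(second)
            break
    # Assumption: if both definitions are met, use the earlier date
    return min(candidates) if candidates else None
```

In this sketch, two outpatient claims only 3 days apart do not qualify on their own, but a third claim 31 days after the first one does, and the patient is dated to that third claim.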
Comorbid conditions of interest included obesity, hypertension, diabetes, cardiovascular disease, COPD, renal disease, and liver disease and were defined using previously validated ICD-10-CM diagnosis algorithms using inpatient and/or outpatient encounters. Comorbid conditions were assessed over the baseline period. Healthcare utilization was defined as the number of days with any medical or pharmacy claim in the year prior to index date, and patients were categorized by quartile. The CHA₂DS₂-VASc score was evaluated as a categorical variable based on the European Society of Cardiology guidelines categorizing the CHA₂DS₂-VASc score into three categories: 0, 1, and ≥2 [44].

Statistical analysis
Distribution of demographic characteristics and comorbidities were assessed at time of cohort entry as mean (SD) or median (interquartile range [IQR]) for continuous variables and proportions (n, %) for categorical variables. Incidence and prevalence of AF were calculated annually for each year from 2017 to 2020, along with overall period prevalence in 2017-2020. Incident AF was defined as a new diagnosis of AF meeting the study algorithm definition. The incidence washout period was defined as the start of all available data until 31 December preceding the year of interest. Prevalent AF was defined as any diagnosis of AF meeting study criteria occurring during the time period of interest, without consideration of washout period. Incidence and prevalence calculations were performed for the overall cohort and within race and ethnicity strata. Prevalence estimates were further stratified by hypertension status, healthcare resource utilization (HCRU) quartile, and CHA₂DS₂-VASc score.
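As an illustration of these definitions, the sketch below flags incident AF using the washout rule and computes an incidence rate per 1,000 person-years and a prevalence percentage. The function names are hypothetical, and the confidence interval uses a normal approximation to the Poisson distribution, which is an assumption: the text does not state how the reported intervals were derived.

```python
import math
from datetime import date

def is_incident(first_af_date, year):
    """Incident AF in `year`: the first qualifying AF diagnosis across all
    available data falls in that year, so the washout period (start of
    data through 31 December of the prior year) is free of AF."""
    return first_af_date is not None and first_af_date.year == year

def incidence_per_1000_py(new_cases, person_years, z=1.96):
    """Incidence rate per 1,000 person-years with an approximate 95% CI.

    Assumption: normal approximation to the Poisson count, so the
    standard error of the count is sqrt(new_cases).
    """
    rate = new_cases / person_years * 1000
    half_width = z * math.sqrt(new_cases) / person_years * 1000
    return rate, rate - half_width, rate + half_width

def prevalence_pct(cases, population):
    """Prevalence as a percentage of the eligible population."""
    return cases / population * 100
```

For example, 100 new cases over 10,000 person-years (hypothetical counts) give a rate of 10.0 per 1,000 PY with an approximate interval of 8.04 to 11.96.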
To assess the degree of potential over/under-ascertainment of AF among patients by racial group, the ratio between the prevalence of AF among highly monitored patients (i.e., those with a recent history of ischemic stroke) and the prevalence in the overall population was assessed. The equation used to determine the prevalence ratio within strata was:

$$\text{prevalence ratio} = \frac{\text{prevalence in recent ischemic stroke population}}{\text{prevalence in overall population}}$$

Additionally, the relative difference between the prevalence of AF in each of the other racial and ethnic populations and the prevalence of AF in the NHW population was computed. The equation used to determine the relative prevalence difference was:

$$\text{relative difference} = \frac{\text{prevalence in target group} - \text{prevalence in non-Hispanic or Latino White}}{\text{prevalence in non-Hispanic or Latino White}}$$

Relative prevalence differences are computed in both the overall and stroke population. Incidence and prevalence estimates were compared in different subgroups to explore potential ascertainment bias. Any observed differences in the overall cohort may be the result of differential testing and could point to potential ascertainment bias. Differences in the stroke cohort, following an acute event, are assumed to be closer to true differences, not originating from biases in healthcare delivery. All analyses were conducted using the Aetion Evidence Platform (2022).
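The two measures can be sketched in a few lines. The function names are illustrative only; the inputs are the 2017-2020 period prevalence values reported in the Results (5.89% and 32.21% for NHW patients in the overall and stroke populations, and 24.26% for Black or African American patients in the stroke population).

```python
def prevalence_ratio(stroke_prev, overall_prev):
    """Prevalence in the recent-stroke cohort divided by prevalence
    in the overall cohort, within one stratum."""
    return stroke_prev / overall_prev

def relative_difference(target_prev, nhw_prev):
    """Relative prevalence difference of a target group versus the
    non-Hispanic or Latino White (NHW) reference group."""
    return (target_prev - nhw_prev) / nhw_prev

# Period prevalence (%) as reported for 2017-2020
nhw_overall, nhw_stroke = 5.89, 32.21
black_stroke = 24.26

print(round(prevalence_ratio(nhw_stroke, nhw_overall), 2))              # 5.47
print(round(100 * relative_difference(black_stroke, nhw_stroke), 1))    # -24.7
```

A ratio near 5 for NHW patients and a relative difference of about -24.7% for Black or African American patients in the stroke cohort match the figures reported in the Results.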

Overall population
The CDM population consisted of 42,392,287 patients from 2017 to 2020; 49% were male, 59% non-Hispanic or Latino White, 12.3% Hispanic or Latino, 9.9% Black or African American, and 4.9% Asian. Approximately 39% of the population were aged 18-44 and nearly 25% were aged 45-64, as shown in Table 1.

Annual incidence
Approximately 7.5 to 8.6 million patients met the overall annual incidence cohort eligibility criteria each year. Of those, 107,864 to 142,507 patients met the stroke cohort eligibility criteria each year. Table 2 shows that the annual AF incidence in the overall cohort ranged from 10.64 (10.57, 10.71) to 11.87 (11.79, 11.94) per 1,000 person-years (PY) between 2017-2020. In the cohort of patients with a recent stroke over the same period of time, the incidence of AF ranged from 66.06 (64.66, 67.45) to 71.95 (70.48, 73.43) per 1,000 PY. The incidence of AF in the overall cohort and stroke cohort was highly stable from 2017-2019, with year-over-year differences of less than 1.23 per 1,000 PY. In 2020, the incidence of AF among all patients decreased from the prior year's estimates by 0.5% and 10.4%, potentially due to COVID-19 pandemic-related changes to healthcare delivery and utilization. Similar AF incidence reductions were observed among patients with a recent ischemic stroke in 2020. Demographic and clinical characteristics did not differ year over year in the incident AF populations. The majority of incident AF patients were non-Hispanic or Latino White, had an income <$40,000 USD per year, had a history of hypertension, and had a mean age of ~74 years old. After stratifying by race, the annual AF incidence among all patients was highest for NHW patients (11.87-13.23), followed

Annual prevalence
Approximately 7.9 to 9.2 million patients met the eligibility criteria for the overall population for the estimation of yearly prevalence over 2017-2020, and 119,154 to 159,761 patients met the eligibility criteria for the population of patients with a recent history of ischemic stroke. Between 2017-2020 the prevalence of AF ranged from 4.71% to 5.58% (Table 3). Among patients with a recent stroke, 2017-2020 annual prevalence of AF ranged from 25.88% to 27.27%. The prevalence of AF increased steadily from 2017-2019, with an overall change of 0.87 percentage points. In 2020, the prevalence of AF among all patients decreased from the prior year's estimates by 0.5% and 10.4% respectively, potentially due to COVID-19 pandemic-related changes to healthcare delivery and utilization. Similar reductions were observed in the prevalence of AF among patients with a recent ischemic stroke in 2020. Demographic and clinical characteristics of patients identified as having prevalent AF are reported in Table 4. The majority of prevalent AF patients were NHW, had an income <$40,000 USD per year, had a history of hypertension, and had a mean age of ~74 years old. After stratifying by race, the prevalence of AF was highest for NHW patients (5.32% to 6.40%), followed by Black or African American (4.18% to 4.96%) and Asian (2.80% to 3.40%) patients, and lowest for Hispanic or Latino patients (1.98% to 2.27%), as shown in Table 3. In the recent ischemic stroke cohort, the ranking differed, with Asian patients having the second highest prevalence of AF (after NHW patients) in 2017, 2019, and 2020.

Prevalence ratio estimated from annual prevalence
To assess the degree of potential over/under-ascertainment of AF among patients by racial group, the prevalence ratio between the prevalence of AF among highly monitored patients (i.e., those with a recent history of ischemic stroke) and the prevalence in the overall population was assessed. The largest prevalence ratios were observed among Asian and Hispanic or Latino patients (6.74 to 7.63 for Asian patients; 9.55 to 10.92 for Hispanic or Latino patients; Table 3). For these patients, the prevalence of AF among the recent stroke population was approximately 7 to 10 times higher than the prevalence among the overall population (compared to a prevalence ratio approximately 5 times higher for NHW patients), suggesting that AF may be under-ascertained in the overall population for Asian and Hispanic or Latino patients.

Period prevalence 2017-2020
The period prevalence for 2017-2020 was 5.09% in the overall population and 29.72% in the stroke population (Table 5). In both the overall population and the recent stroke population, NHW patients had the highest period prevalence, with 5.89% and 32.21% respectively. Hispanic or Latino patients had the lowest prevalence in the overall population, with a prevalence of 2.08%, while the Black or African American patient population had the lowest prevalence in the stroke patient population at 24.26%.

Prevalence ratio estimated from period prevalence 2017-2020
The largest prevalence ratios were observed among Asian and Hispanic or Latino patients (7.88 and 11.75 respectively; Table 5), for whom the prevalence of AF among the recent stroke population was approximately 7 to 11 times higher than the prevalence among the overall population (compared to a prevalence ratio approximately 5 times higher for NHW patients), suggesting that AF may be under-ascertained in the overall population for Asian and Hispanic or Latino patients. The largest prevalence ratios were seen in the no hypertension subgroup, with the lowest prevalence ratio in NHW patients (13.90) and the highest prevalence ratios in Asian (32.70) and Hispanic or Latino (23.06) patients. This is in sharp contrast to the patterns seen in the hypertension subgroup, where the prevalence ratio across racial and ethnic groups ranged from 2.63 among NHW patients to 3.81 among Asian patients. In the HCRU quartile stratified results, the largest prevalence ratios were seen in the first HCRU quartile, with a range of 20.65 among NHW patients to 63.57 among Asian patients, while the lowest prevalence ratios were seen in the fourth HCRU quartile, where the prevalence ratio across racial and ethnic groups ranged from 2.07 among Black or African American patients to 3.53 among Asian patients.
In the results stratified by CHA₂DS₂-VASc score categories, the largest overall prevalence ratio was in the CHA₂DS₂-VASc score of 0 group (17.75), where the lowest prevalence ratio was among NHW patients (15.35) and the highest prevalence ratio was observed among Asian and Hispanic or Latino patients (43.60 and 27.05 respectively). The prevalence ratio decreases as the CHA₂DS₂-VASc score increases, with the lowest prevalence ratio seen in the CHA₂DS₂-VASc score ≥2 subgroup (range 2.56 among Black or African American patients to 4.05 among Asian patients).

Relative difference in period prevalence 2017-2020
To allow for a comparison of the results with prior observational studies, the relative difference in 2017-2020 period prevalence between each of the other racial and ethnic populations and the NHW population is reported in Table 6. Among the overall population, the greatest magnitude of relative difference was seen in the Hispanic or Latino population (64.6%), followed by the Asian (46.2%) and the Black or African American (16.3%) populations. In the stroke population, the relative differences in period prevalence change (24.7%, 24.1% and 22.4% for the Black or African American, Hispanic or Latino, and Asian populations, respectively) and are more similar across racial and ethnic populations than in the overall population.

Discussion
This study explores prevalence and incidence of AF in two cohorts: one cohort reflecting standard clinical practice in the US, and the other mimicking a population subjected to AF monitoring or screening. Unlike previous observational research, which tends to focus on either clinical practice or monitoring separately, this study investigates both settings within the same population. This approach is key to assessing the impact of ascertainment bias quantitatively across subgroups.
The investigation is performed on a large-scale claims database (CDM) which represents the demographics of the United States across various geographical regions and insurance categories. In contrast, other observational studies often target smaller, and regionally or demographically limited, populations. Additionally, the study's scope encompasses an analysis of Asian and Hispanic or Latino subgroups, which historically have been less studied in US observational AF studies.
The findings of this study substantiate prior hypotheses regarding racial and ethnic variations in AF prevalence. Comparing, for example, annual and period AF prevalence in the CDM population with prior observational studies, we identified a consistent pattern. In both cohorts, the Black or African American, Hispanic or Latino, and Asian populations have a lower AF prevalence compared to the NHW population. However, the observed differences are assumed to be inflated by ascertainment bias, and the true differences are expected to be less pronounced than initially estimated, particularly in the case of Asian and Hispanic or Latino populations.
Our findings show the greatest prevalence ratios across the overall and stroke cohort were observed among Asian and Hispanic or Latino patients, suggesting that AF may be under-ascertained in the overall population for these patients. In the social determinants of health (SDOH) subgroups, the greatest prevalence ratios were observed in the 'no hypertension' subgroup, the first HCRU quartile and the group with a CHA₂DS₂-VASc score of 0, with a consistent pattern of the lowest prevalence ratio in NHW patients and the highest prevalence ratios in Asian and Hispanic or Latino patients. The most significant disparity in ascertainment of AF is thus observed among Asian and Hispanic or Latino patients, overall and within subgroups characterized by good overall health (HCRU quartile 1) and a minimal presence of AF risk factors. For NHW and Black or African American populations with comparable health statuses, the gap in AF ascertainment is less pronounced compared to Asian and Hispanic or Latino patients. However, it is essential to recognize that even within these populations, disparities in AF diagnosis and detection may still exist, warranting further investigation.
To facilitate comparisons between the various studies and our results, Fig 1 presents the relative prevalence difference of the NHW subgroup in relation to other racial and ethnic subgroups. Prevalence estimates across racial and ethnic subgroups from the different observational studies that used clinical detection methods are described in S2 Table. The focus is on prevalence results as incidence rates are not frequently reported in prior studies. It is important to note that variations in the definitions of racial/ethnic groups exist, as well as differences in the patient populations included across the observational studies. Most observational studies target the overall adult population, yet some target populations with prior comorbidities, specific age groups or ethnicities, resulting in a broad range of results [26,28,31,32].
Comparing across studies, the Black or African American, Hispanic or Latino and Asian AF prevalence is estimated to be 44%, 50% and 43% lower, respectively, than the AF prevalence for the NHW subgroup (average reported across 16, 9 and 7 studies, respectively). For the Asian and Hispanic or Latino populations, the findings of the overall adult cohort from this study, which relies on clinical-based detection, align with the findings of prior studies, and highlight similar differences in prevalence of 46.2% and 64.6% respectively. The prevalence difference of the Black or African American population, at 16.3%, is estimated to be lower than what is observed in prior studies. Although prevalence differences vary with study design and population, the results indicate that minority subgroups consistently have a lower AF prevalence than NHW subgroups.
To better understand and address the AF paradox, several observational studies have been conducted, employing monitoring-based methods for the diagnosis of AF. An overview of AF prevalence across these studies is provided in S3 Table. Comparing across studies, the Black or African American, Hispanic or Latino and Asian AF prevalence is estimated to be 23%, 3% and 27% lower, respectively, than the AF prevalence for the NHW subgroup (average reported across 3, 1 and 1 studies, respectively). The variability in detection of AF by diagnosis method underscores potential disparities in observational data, as studies where patients were systematically screened for AF using monitoring-based methods, such as MESA [25] and a pacemaker-based study [33], showed lower prevalence differences among subgroups. Not all studies address all races and ethnicities, and significant differences exist across studies. The findings of the stroke cohort from this study, which assumes close monitoring of the patients, are directionally similar, with prevalence differences of 24.7%, 24.1% and 22.4% for the Black or African American, Hispanic or Latino, and Asian populations, respectively. While the prevalence gap for the Asian and Hispanic or Latino populations decreases by more than 50%, it increases for the Black or African American population. This could be ascribed to differences in ischemic stroke risk factors, etiology and outcomes in Black or African American patients [45].
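As a worked check of the change in the prevalence gap, the arithmetic below uses the relative differences reported in the Results for the overall and stroke cohorts. The symbol $\Delta_{\text{gap}}$ is introduced here for illustration only, and the values are rounded to one decimal place.

$$\Delta_{\text{gap}} = \frac{\text{relative difference}_{\text{overall}} - \text{relative difference}_{\text{stroke}}}{\text{relative difference}_{\text{overall}}}$$

$$\text{Asian: } \frac{46.2 - 22.4}{46.2} \approx 51.5\% \qquad \text{Hispanic or Latino: } \frac{64.6 - 24.1}{64.6} \approx 62.7\% \qquad \text{Black or African American: } \frac{16.3 - 24.7}{16.3} \approx -51.5\%$$

A positive value indicates a narrowing gap relative to NHW patients, and the negative value for the Black or African American population indicates that the gap widens in the stroke cohort.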
The number of studies focused on racial/ethnic differences in AF monitoring is limited, especially for Asian and Hispanic or Latino populations. When comparing the relative prevalence differences of the NHW subgroup with Black or African American, Hispanic or Latino, and Asian subgroups together with the results from past observational studies, a difference is observed based on the method of AF detection employed in the studies. Specifically, results that relied on clinical-based detection methods report higher relative prevalence differences than results that relied on monitoring-based detection methods. Studies utilizing continuous monitoring procedures or events that assume monitoring tend to reveal lower relative prevalence differences among these subgroups. This would suggest that when standard clinical assessments are used for AF diagnosis, differences in prevalence among racial and ethnic subgroups are overestimated due to ascertainment bias.
Regardless of the method used for AF diagnosis, significant differences in prevalence remain, indicating that factors other than ascertainment bias are also likely contributors. We conclude that while ascertainment bias is unlikely to explain all differences in AF prevalence observed by race and ethnicity, current clinical estimates, whether due to access to care or testing approaches, are less accurate in reflecting the 'true' burden of AF within Asian and Hispanic or Latino patient populations. Factors that can contribute to ascertainment bias of AF in these racial and ethnic groups include language barriers, socioeconomic factors, cultural barriers, and differences in healthcare-seeking behavior. These findings underscore the importance of not only the diagnostic methods employed when assessing AF prevalence, but also patient education, culturally competent care, and improved and equal access to treatment and care, including appropriate screening and diagnostic protocols.
There is an immediate opportunity to apply these findings in the clinical trial setting. When using RWD in designing a trial, it is imperative we proactively account for data biases arising from known health disparities. For example, when establishing evidence-based diversity enrollment targets, it is critical to exercise caution and address potential ascertainment bias that can differentially affect minority groups. In the specific context of AF RCTs, ascertainment bias can introduce disparities in the identification, diagnosis, and enrollment of individuals from different racial and ethnic backgrounds. Acknowledging and addressing ascertainment challenges can improve diverse enrollment in AF RCTs. In turn, this will contribute to generating robust evidence for populations affected by AF.

Limitations
This study examines ascertainment bias in AF, and its design controls for such bias by contrasting the overall population with a cohort of patients with a recent ischemic stroke, for whom AF monitoring is part of stroke guidelines [46]. The study relies on claims data to characterize differences, as opposed to observational studies that rely on active monitoring-based diagnosis methods for AF to assess the impact of ascertainment bias on AF differences across racial and ethnic groups.
There are several potential limitations to consider when using real-world administrative data to assess prevalence and incidence for underrepresented groups. This study relies on the use of secondary data collected for administrative, and not research, purposes. Results from this study depend on the accuracy of diagnostic codes used in the diagnosis of a health state. There is the potential for miscoding of atrial fibrillation and ischemic stroke events, and a subsequent risk of misclassification bias in patient records, due to provider coding patterns (e.g., using diagnosis codes to indicate rule-out criteria) or incorrect coding (e.g., data entry errors), which may lead to misclassification of diagnoses or patient characteristics. The algorithms identified for the current study have been previously validated with a focus on precision [41,43]. However, the calendar time components used in some of the algorithms introduce the potential for undercounting people with the outcome of interest, particularly if there are issues with continuity of care over the course of the 2-year timeframe. Using a requirement of continuous enrollment in the baseline period could result in selection bias and underreporting of events. Gaps in coverage could also contribute to underreporting. These limitations could result in an underestimation of the incidence and prevalence of these outcomes.
Additionally, the derived race and ethnicity information in the CDM is likely to be less accurate than self-reported data. Prior research has shown that NHW and Black or African American individuals are more likely to be correctly identified in healthcare data, while individuals who identify as Asian, Hispanic or Latino, or multi-racial are often inaccurately categorized as White [18]. As such, misclassification of demographics such as race, ethnicity, and other factors used in the analysis cannot be ruled out, nor is it possible to make assumptions on AF prevalence and incidence patterns for patients in the missing/unknown category.
Another limitation of this study concerns the factors contributing to health disparities in atrial fibrillation further downstream from diagnosis. These include disparities in access to care, disparities in diagnosis patterns, and disparities in health coverage. The data used for this study cannot be used to assess the contributions of these downstream factors to the incidence and prevalence of the outcomes. However, the data presented from this study reflect the real-world experiences of insured patients in the US.
Finally, in the study design, it is presumed that stroke guidelines are closely adhered to, especially the active monitoring of patients following an ischemic stroke event. Although the use of monitoring procedure codes was also considered in the cohort design, they were not implemented because limited validation is available for these codes. It is also important to consider that, as our second cohort only consists of patients with a past ischemic stroke event, it might not be representative of a general population. However, to support the generalizability of our study, we included the patient characteristics of both cohorts in Table 1 and investigated the etiology of ischemic strokes. The NHW population has a significantly greater proportion of cardioembolic stroke, the leading stroke subtype caused by atrial fibrillation and/or flutter, than the Hispanic or Latino and Black or African American populations. For other stroke subtypes associated with AF, such as cryptogenic stroke, proportions were not significantly different among the 3 racial/ethnic groups [45]. As the proportion of AF-associated stroke subtypes is largest for the NHW subgroup, the reduction in the AF prevalence gap between subgroups is less likely to be driven by differences in ischemic stroke etiology.

Conclusion
Racial and ethnic minority groups face significant disparities in healthcare, including underrepresentation in clinical trials. Underrepresentation can hinder the application of trial findings to these populations. The FDA has published guidelines to increase representation in trials and reduce racial and ethnic health disparities by setting enrollment targets for underrepresented groups. Real-world data (RWD) is valuable in estimating disease incidence and prevalence among racial and ethnic subgroups to inform these enrollment targets.
However, assessing racial and ethnic differences in RWD requires distinguishing true disease differences from biases originating from healthcare delivery, such as ascertainment bias. This study utilizes US claims RWD to estimate incidence and prevalence of atrial fibrillation (AF) within racial and ethnic groups, considering both the overall adult population and a highly monitored population to evaluate ascertainment bias.
The study findings align with previous observational studies, revealing lower incidence and prevalence rates of AF in US racial/ethnic minority groups. However, a key element influencing the reported prevalence differences is the choice of AF diagnostic methods. Specifically, prevalence estimates derived from routine clinical-based detection methods exhibit higher relative prevalence differences compared to monitoring-based detection methods, particularly among healthy Asian and Hispanic or Latino subgroups.
These findings emphasize the importance of considering the diagnostic method when assessing AF prevalence. Similarly, addressing data biases linked to health disparities is crucial in the use of real-world evidence to define recruitment targets for underrepresented groups.

Funding:
The authors received no specific funding for this work. LH, KH, JH, HV, YM & KS are full-time salaried employees of Janssen Research & Development, a pharmaceutical company of Johnson & Johnson. AB, CJ & AT are full-time salaried employees of Aetion, Inc. The specific roles of these authors are articulated in the 'author contributions' section. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests:
LH, KH, JH, HV, YM & KS are employees of Janssen Research and Development, a unit of the Johnson and Johnson family of companies. The work on this study was part of their employment. AB, CJ & AT are paid employees of and shareholders in Aetion, Inc., a company that makes software for the analysis of real-world data. This does not alter our adherence to PLOS One policies on sharing data and material.

Fig 1. Overview of relative AF prevalence difference of Asian, Black or African American, and Hispanic or Latino subgroup versus the NHW population across observational studies that used clinical detection methods.
https://doi.org/10.1371/journal.pone.0301991.g001

Table 2. Annual incidence of atrial fibrillation among all US adults vs. US adults with a recent history of ischemic stroke, 2017-2020.
Column headers: Number of patients at-risk (All Adults); Incidence Rate (per 1000 PY) (All Adults); Number of patients at-risk (Recent Stroke)
https://doi.org/10.1371/journal.pone.0301991.t002

Table 3. Prevalence ratio estimated from the annual prevalence of atrial fibrillation among all adults vs. adults with a recent ischemic stroke.
Column headers: Atrial Fibrillation Prevalence (%) (All Adults); Atrial Fibrillation Prevalence (%) (Recent Ischemic Stroke); Prevalence ratio*
*Prevalence ratio is calculated as the prevalence among the population of patients with a recent stroke divided by the prevalence among the adult population, overall and within each stratification.
https://doi.org/10.1371/journal.pone.0301991.t003
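The footnote's definition can be written compactly as below. The notation is introduced here for illustration only and is not the authors' own: g indexes a stratum (for example, a racial or ethnic group), and the two prevalence terms are the AF prevalence in the recent-stroke cohort and in the overall adult population for that stratum.

```latex
\[
\mathrm{PR}_g \;=\; \frac{P_g^{\text{stroke}}}{P_g^{\text{all}}}
\]
```

Under this definition, a stratum whose ratio exceeds that of the reference stratum is one where prevalence rises proportionally more when monitoring is assumed.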

Column headers: Atrial Fibrillation Period Prevalence (%) (All Adults); Atrial Fibrillation Period Prevalence (%) (Recent Ischemic Stroke); Prevalence ratio*
Stratification: CHA2DS2-VASc = 0
*Prevalence ratio is calculated as the prevalence among the population of patients with a recent stroke divided by the prevalence among the overall adult population, overall and within each stratification.
Abbreviations: HCRU, Healthcare Resource Utilization.
https://doi.org/10.1371/journal.pone.0301991.t005